AITopics | quantization value


quantization value

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Optimal and Approximate Adaptive Stochastic Quantization

Neural Information Processing Systems, Feb-17-2026, 08:35:20 GMT

We revisit the Adaptive Stochastic Quantization (ASQ) problem and present algorithms that find optimal solutions with asymptotically improved time and space complexities.
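As a rough illustration of the objective behind ASQ, here is a minimal Python sketch of unbiased stochastic quantization onto a fixed, sorted set of quantization values. Each input is rounded up or down to a neighbouring value at random so that its expected output equals the input, and the resulting expected squared error is the sum of the per-coordinate variances (b - x)(x - a). The function names are our own. In the adaptive setting, the set `q` would be chosen per input to minimize `expected_sq_error`; the paper's algorithms for choosing it are not reproduced here.

```python
import bisect
import random
from typing import Sequence


def stochastic_quantize(x: float, q: Sequence[float], rng=random) -> float:
    """Round x to one of its two neighbours a <= x <= b in the sorted list q,
    picking b with probability (x - a) / (b - a) so that E[output] = x."""
    if not q[0] <= x <= q[-1]:
        raise ValueError("x must lie within [min(q), max(q)]")
    j = bisect.bisect_left(q, x)
    if q[j] == x:  # x is itself a quantization value: no rounding error
        return x
    a, b = q[j - 1], q[j]
    return b if rng.random() < (x - a) / (b - a) else a


def expected_sq_error(xs: Sequence[float], q: Sequence[float]) -> float:
    """Expected squared error of stochastic_quantize over all inputs: for x in
    [a, b] the Bernoulli choice between a and b has variance (b - x)(x - a)."""
    total = 0.0
    for x in xs:
        if not q[0] <= x <= q[-1]:
            raise ValueError("every input must lie within [min(q), max(q)]")
        j = bisect.bisect_left(q, x)
        if q[j] != x:
            a, b = q[j - 1], q[j]
            total += (b - x) * (x - a)
    return total
```

For example, with q = [0.0, 0.5, 1.0] the input 0.35 is mapped to 0.5 with probability 0.7 and to 0.0 otherwise, so its mean is 0.35.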

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


1dba3025b159cd9354da65e2d0436a31-Paper.pdf

Neural Information Processing Systems, Feb-7-2026, 18:15:06 GMT

Traditionally, federated learning (FL) aims to train a single global model while collaboratively using multiple clients and a server. Two natural challenges that FL algorithms face are heterogeneity in data across clients and collaboration of clients with diverse resources.
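For readers new to FL, the "single global model" setting usually means a server that repeatedly averages parameters sent by clients. The Python sketch below shows FedAvg-style weighted averaging, where each client is weighted by its number of local training examples. It is a well-known baseline, not the method of this paper, and it assumes all clients share one architecture, which is exactly what the diverse-resources challenge mentioned above complicates. The function name is hypothetical.

```python
from typing import List, Sequence


def fedavg_aggregate(client_weights: Sequence[Sequence[float]],
                     client_sizes: Sequence[int]) -> List[float]:
    """Server step of a FedAvg-style round: average the clients' parameter
    vectors, weighting each client by its number of local examples."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        if len(w) != dim:
            raise ValueError("all clients must send vectors of equal length")
        for i, wi in enumerate(w):
            global_w[i] += (n / total) * wi
    return global_w
```

With non-IID client data, this plain average can drift away from what any single client needs, which is one way the data-heterogeneity challenge shows up in practice.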

artificial intelligence, eqc, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Virginia (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Optimal and Approximate Adaptive Stochastic Quantization

Neural Information Processing Systems, Oct-10-2025, 12:48:49 GMT

We revisit the Adaptive Stochastic Quantization (ASQ) problem and present algorithms that find optimal solutions with asymptotically improved time and space complexities.

algorithm, quantization value, quiver, (15 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


PARQ: Piecewise-Affine Regularized Quantization

Jin, Lisa, Ma, Jianhao, Liu, Zechun, Gromov, Andrey, Defazio, Aaron, Xiao, Lin

arXiv.org Artificial Intelligence, Mar-19-2025

Modern deep learning models exhibit exceptional vision and language processing capabilities, but they come with excessive size and heavy demands on memory and compute. Quantization is an effective approach to model compression that can significantly reduce memory footprint, computing cost, and inference latency (e.g., Han et al., 2016; Sze et al., 2017). There are two main classes of quantization methods: post-training quantization (PTQ) and quantization-aware training (QAT). Both are widely adopted and have received extensive research; see the recent survey papers (Gholami et al., 2022; Fournarakis et al., 2022) and references therein. PTQ converts the weights of a pre-trained model directly into lower precision without repeating the training pipeline; it thus has less overhead and is relatively easy to apply (Nagel et al., 2020; Cai et al., 2020; Chee et al., 2024). However, it is mainly limited to regimes of 4 or more bits and can suffer steep performance drops with fewer bits (Yao et al., 2022; Dettmers & Zettlemoyer, 2023). This is especially the case for transformer-based models, which prove harder to quantize (Bai et al., 2021; Qin et al., 2022) than convolutional architectures (Martinez et al., 2019; Qin et al., 2020). On the other hand, QAT integrates quantization into pre-training and/or fine-tuning and can produce low-bit (especially binary) models with mild performance degradation (e.g.

machine learning, natural language, quantization, (18 more...)

arXiv.org Artificial Intelligence

2503.15748

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(4 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
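To make the PTQ description in the PARQ abstract concrete, here is a minimal Python sketch of one common post-training scheme: per-tensor symmetric uniform quantization with a max-abs scale and round-to-nearest. It converts already-trained weights without any retraining. The scale rule, the function names, and the synthetic Gaussian "weights" are assumptions for illustration; PARQ itself targets the QAT setting and is not shown.

```python
import random
from typing import List, Tuple


def quantize_symmetric(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Per-tensor symmetric uniform PTQ: choose the scale so that max|w| maps
    to the largest code, then round each weight to the nearest integer code."""
    if bits < 2:
        raise ValueError("the symmetric scheme needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Python's round() is round-half-to-even; the clamp guards the code range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    return [c * scale for c in codes]


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)
    w = [rng.gauss(0.0, 0.02) for _ in range(4096)]  # synthetic weights
    for bits in (8, 4, 3, 2):
        codes, scale = quantize_symmetric(w, bits)
        print(bits, "bits -> MSE", mse(w, dequantize(codes, scale)))
```

Running it shows the reconstruction error growing as the bit width drops, which is the direction of the low-bit degradation the abstract attributes to PTQ; actual accuracy losses depend on the model and are not measured here.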


LCQ: Low-Rank Codebook based Quantization for Large Language Models

Cai, Wen-Pu, Li, Wu-Jun

arXiv.org Artificial Intelligence, May-31-2024

Large language models (LLMs) have recently demonstrated promising performance in many tasks. However, the high storage and computational cost of LLMs has made them challenging to deploy. Weight quantization has been widely used for model compression and can reduce both storage and computational cost. Most existing weight quantization methods for LLMs use a rank-one codebook for quantization, which results in substantial accuracy loss when the compression ratio is high. In this paper, we propose a novel weight quantization method for LLMs, called low-rank codebook based quantization (LCQ). LCQ adopts a low-rank codebook, whose rank can be larger than one, for quantization. Experiments show that LCQ can achieve better accuracy than existing methods with negligible extra storage cost.
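One way to picture "rank-one" versus "low-rank" codebooks, under our own assumed formulation rather than LCQ's exact one: arrange each weight group's quantization levels as a row of a matrix C. With scale-only uniform quantization every row is a multiple of the same integer grid, so C has rank one; writing C = S·G^T with r columns in S and G allows rank r and lets groups use unevenly spaced levels. The Python sketch below builds such codebooks and quantizes each weight to the nearest level in its group's row. The factors S and G are hand-picked and hypothetical, and how LCQ parametrizes and learns its codebook is not reproduced.

```python
from typing import List, Sequence


def build_codebook(S: Sequence[Sequence[float]],
                   G: Sequence[Sequence[float]]) -> List[List[float]]:
    """Codebook matrix C = S @ G^T. Row g holds the quantization levels of
    weight group g; S is (groups x r) and G is (levels x r), so rank(C) <= r."""
    return [[sum(a * b for a, b in zip(s_row, g_row)) for g_row in G]
            for s_row in S]


def quantize_groups(groups: Sequence[Sequence[float]],
                    C: Sequence[Sequence[float]]) -> List[List[float]]:
    """Replace each weight with the nearest level in its own group's row of C."""
    return [[min(row, key=lambda c: abs(w - c)) for w in group]
            for group, row in zip(groups, C)]


def sq_error(groups: Sequence[Sequence[float]],
             quantized: Sequence[Sequence[float]]) -> float:
    return sum((w - q) ** 2
               for g, qg in zip(groups, quantized) for w, q in zip(g, qg))


if __name__ == "__main__":
    grid = [[-2.0], [-1.0], [0.0], [1.0]]      # 2-bit integer grid, r = 1
    C1 = build_codebook([[0.1], [0.2]], grid)  # one scale per group: rank one
    # Hypothetical rank-2 codebook: a second shared level pattern gives
    # group 1 unevenly spaced levels (about -0.35, -0.2, 0.0, 0.25).
    G2 = [[-2.0, 1.0], [-1.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
    C2 = build_codebook([[0.1, 0.0], [0.2, 0.05]], G2)
    groups = [[0.06, -0.14], [0.25, -0.35]]
    print("rank 1:", sq_error(groups, quantize_groups(groups, C1)))
    print("rank 2:", sq_error(groups, quantize_groups(groups, C2)))
```

The lower rank-2 error here holds by construction for these hand-picked numbers and is not a result; the extra column costs one more stored coefficient per group, which is the kind of small storage overhead the abstract refers to.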

codebook, post-training quantization, quantization, (16 more...)

arXiv.org Artificial Intelligence

2405.20973

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


Optimal and Near-Optimal Adaptive Vector Quantization

Ben-Basat, Ran, Ben-Itzhak, Yaniv, Mitzenmacher, Michael, Vargaftik, Shay

arXiv.org Artificial Intelligence, Feb-5-2024

Quantization is a fundamental optimization for many machine-learning use cases, including compressing gradients, model weights and activations, and datasets. The most accurate form of quantization is adaptive, where the error is minimized with respect to a given input, rather than optimizing for the worst case. However, optimal adaptive quantization methods are considered infeasible in terms of both their runtime and memory requirements. We revisit the Adaptive Vector Quantization (AVQ) problem and present algorithms that find optimal solutions with asymptotically improved time and space complexity. We also present an even faster near-optimal algorithm for large inputs. Our experiments show our algorithms may open the door to using AVQ more extensively in a variety of machine learning applications.
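For context on what "optimal" means here, the Python sketch below is a straightforward dynamic program for adaptive quantization, assuming the objective is the error of unbiased stochastic rounding, i.e. the sum of (b - x)(x - a) over the inputs. It runs in O(s·d^2) time and O(s·d) memory for d inputs and s values, which illustrates why exact solutions have been considered expensive for large inputs; it is a baseline for illustration, not the authors' faster algorithms. It also assumes, as is standard for this problem, that an optimal solution can use input points as quantization values, including the minimum and maximum. The name `optimal_avq` is hypothetical.

```python
from typing import List, Tuple


def optimal_avq(xs: List[float], s: int) -> Tuple[float, List[float]]:
    """Exact adaptive quantization by an O(s * d^2) dynamic program.
    Objective: sum over inputs x of (b - x) * (x - a), the variance of
    unbiased stochastic rounding between neighbouring values a <= x <= b.
    Assumption: an optimal set of s values can be drawn from the sorted
    inputs and contains their minimum and maximum."""
    x = sorted(xs)
    d = len(x)
    if s < 2 or d < 2:
        raise ValueError("need s >= 2 and at least two inputs")
    s = min(s, d)

    # Prefix sums of x and x^2 let us price any interval in O(1).
    p1, p2 = [0.0], [0.0]
    for v in x:
        p1.append(p1[-1] + v)
        p2.append(p2[-1] + v * v)

    def cost(k: int, j: int) -> float:
        # sum_{i=k..j} (x[j] - x[i]) * (x[i] - x[k])
        #   = -sum x_i^2 + (x[k] + x[j]) * sum x_i - n * x[k] * x[j]
        n = j - k + 1
        s1 = p1[j + 1] - p1[k]
        s2 = p2[j + 1] - p2[k]
        return -s2 + (x[k] + x[j]) * s1 - n * x[k] * x[j]

    INF = float("inf")
    # D[t][j]: least cost covering x[0..j] with t values, the last one at x[j].
    D = [[INF] * d for _ in range(s + 1)]
    parent = [[-1] * d for _ in range(s + 1)]
    D[1][0] = 0.0
    for t in range(2, s + 1):
        for j in range(t - 1, d):
            for k in range(t - 2, j):
                c = D[t - 1][k] + cost(k, j)
                if c < D[t][j]:
                    D[t][j], parent[t][j] = c, k

    # Follow parent pointers back from the maximum to recover the values.
    q, j = [], d - 1
    for t in range(s, 0, -1):
        q.append(x[j])
        j = parent[t][j]
    return D[s][d - 1], q[::-1]
```

For instance, on the inputs [0, 1, 2, 10] with s = 3, placing the middle value at 2 costs 1 (only the point 1 is rounded, with variance 1 · 1), while placing it at 1 costs 8, so the program returns [0, 2, 10].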

dimension, quantization value, quiver, (13 more...)

arXiv.org Artificial Intelligence

2402.03158

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)


THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression

Li, Minghao, Basat, Ran Ben, Vargaftik, Shay, Lao, ChonLam, Xu, Kevin, Tang, Xinran, Mitzenmacher, Michael, Yu, Minlan

arXiv.org Artificial Intelligence, Feb-16-2023

Deep neural networks (DNNs) are the de facto standard for essential use cases such as image classification, computer vision, and natural language processing. As DNNs and datasets grow, they require distributed training on ever larger clusters. A major bottleneck is then the communication overhead incurred as workers exchange model updates (i.e., gradients) every round. To address this bottleneck and accelerate training, a widely deployed approach is compression. However, previous deployments often build bi-directional compression by simply applying a uni-directional gradient compression scheme in each direction. This results in significant computational overhead at the parameter server (PS) and increased compression error, leading to longer training and lower accuracy. We introduce Tensor Homomorphic Compression (THC), a novel bi-directional compression framework that enables the direct aggregation of compressed values while optimizing the bandwidth-to-accuracy tradeoff, thus eliminating the aforementioned overheads. Moreover, THC is compatible with in-network aggregation (INA), which allows for further acceleration. Evaluation on a testbed shows that THC improves time-to-accuracy compared to alternatives by up to 1.32x with a software PS and up to 1.51x using INA. Finally, we demonstrate that THC is scalable and tolerates packet loss at acceptable rates.
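The phrase "direct aggregation of compressed values" can be illustrated with a simple property: if every worker quantizes onto the same uniform grid lo + step·k, the server can add the integer codes without decoding them, because decoding is affine and the sum of the decoded vectors equals n·lo + step·(sum of codes). The Python sketch below shows only that property. The shared grid parameters `lo`, `step`, and `levels` are assumed to be agreed in advance, the function names are hypothetical, and THC's actual encoding is not reproduced.

```python
from typing import List, Sequence


def encode(grad: Sequence[float], lo: float, step: float, levels: int) -> List[int]:
    """Worker: map each coordinate to the nearest point lo + step * k of a
    shared grid, with k clamped to [0, levels - 1]."""
    return [max(0, min(levels - 1, round((g - lo) / step))) for g in grad]


def aggregate_codes(all_codes: Sequence[Sequence[int]]) -> List[int]:
    """Server: add the integer codes coordinate-wise, without decoding."""
    return [sum(col) for col in zip(*all_codes)]


def decode_sum(code_sum: Sequence[int], lo: float, step: float,
               n_workers: int) -> List[float]:
    """Decoding is affine, so the decoded sum is n * lo + step * (sum of codes),
    which equals the sum of the workers' individually decoded vectors."""
    return [n_workers * lo + step * k for k in code_sum]
```

One practical consequence is that a summed code can reach n·(levels - 1) and therefore needs more bits than a single code, which any real scheme has to budget for.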

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2302.08545

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Washington > King County > Renton (0.04)
North America > Canada > Ontario > Toronto (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Attention Round for Post-Training Quantization

Diao, Huabin, Li, Gongyan, Xu, Shaoyun, Hao, Yuexing

arXiv.org Artificial Intelligence, Jul-7-2022

At present, quantization methods for neural network models fall mainly into two classes: post-training quantization (PTQ) and quantization-aware training (QAT). Post-training quantization needs only a small part of the data to complete the quantization process, but the resulting quantized model does not perform as well as one obtained with quantization-aware training. This paper presents a novel quantization method called Attention Round. This method gives a parameter w the opportunity to be mapped to any of the possible quantized values, rather than just the two quantized values nearest to w. The probability of being mapped to a given quantized value is negatively correlated with the distance between that value and w, and decays according to a Gaussian function. In addition, this paper uses the lossy coding length as a measure to assign bit widths to the different layers of the model, addressing mixed-precision quantization while avoiding a combinatorial optimization problem. This paper also performs experiments on different models, and the results confirm the effectiveness of the proposed method. For ResNet18 and MobileNetV2, the proposed post-training quantization requires only 1,024 training samples and 10 minutes to complete the quantization process, and achieves quantization performance on par with quantization-aware training.
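A minimal Python sketch of the sampling rule described above, assuming that the probability of mapping w to a quantized value q is proportional to exp(-(w - q)^2 / (2·sigma^2)) and normalized over the whole grid. The width `sigma` and the function names are hypothetical; how the paper sets or schedules the Gaussian width, and how it uses the sampled values during optimization, is not specified here.

```python
import math
import random
from typing import List, Sequence


def attention_round_probs(w: float, grid: Sequence[float], sigma: float) -> List[float]:
    """Probability of mapping w to each quantized value in grid: proportional
    to exp(-(w - q)^2 / (2 sigma^2)), normalized to sum to one."""
    logits = [-((w - q) ** 2) / (2.0 * sigma ** 2) for q in grid]
    m = max(logits)  # subtract the max before exp for numerical stability
    unnorm = [math.exp(lg - m) for lg in logits]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def attention_round(w: float, grid: Sequence[float], sigma: float, rng=random) -> float:
    """Sample one quantized value for w using attention_round_probs."""
    probs = attention_round_probs(w, grid, sigma)
    return rng.choices(list(grid), weights=probs, k=1)[0]
```

Unlike stochastic rounding, which only chooses between the two neighbours of w, every grid point receives nonzero probability here; shrinking `sigma` concentrates the mass on the nearest value and approaches round-to-nearest.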

artificial intelligence, attention round, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2207.03088

Country: Asia > China > Beijing > Beijing (0.05)

Genre: Research Report (0.84)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
